Conversation
📝 Walkthrough

Adds optional MLflow logging: a new dependency-optional mlflow wrapper, CLI/train args and TrainingArgs fields for MLflow, logger integration to send metrics/params to MLflow, and wiring to initialize/resume and finish MLflow runs during training.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI/main()
    participant Train as train()
    participant Logger as AsyncStructuredLogger
    participant Wrapper as mlflow_wrapper
    participant MLflow as MLflow Backend
    CLI->>Wrapper: init(tracking_uri, experiment_name, run_name)
    Wrapper->>MLflow: start_run()/resume_run()
    Wrapper->>Wrapper: store _active_run_id
    Wrapper->>MLflow: log_params(params)
    CLI->>Train: start training (use_mlflow=True)
    Train->>Logger: __init__(..., use_mlflow=True)
    Logger->>Wrapper: check_mlflow_available()
    Train->>Logger: log(metrics, step)
    Logger->>Wrapper: log(data, step)
    Wrapper->>Wrapper: _ensure_run_for_logging()
    Wrapper->>MLflow: log_metrics(data, step)
    Train->>Wrapper: finish()
    Wrapper->>MLflow: end_run()
    Wrapper->>Wrapper: clear _active_run_id
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/mini_trainer/train.py`:
- Around line 1235-1238: The MLflow enablement check currently only sets
use_mlflow based on mlflow_tracking_uri; change it so use_mlflow is true if any
MLflow-related arg is provided (e.g., mlflow_tracking_uri,
mlflow_experiment_name, mlflow_run_name) so passing --mlflow-experiment-name or
--mlflow-run-name enables MLflow even when tracking URI is omitted; update the
initialization of use_mlflow (the variable referenced in this file) to check all
MLflow args and keep use_wandb logic unchanged.
- Around line 1304-1313: The MLflow init and log_params block (inside the
use_mlflow branch calling mlflow_wrapper.init and mlflow_wrapper.log_params,
currently gated by local_rank == 0 via log_rank_0) must be executed only on the
global rank 0 process to avoid creating one MLflow run per node; change the
guard so that the mlflow_wrapper.init and mlflow_wrapper.log_params calls run
only when the global process rank is 0 (e.g., use torch.distributed.get_rank()
== 0 or an existing global_rank variable) and keep
AsyncStructuredLogger/log_rank_0 behavior unchanged for other logs.
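Both fixes in the prompt above reduce to a few lines. The sketch below is illustrative only: the helper names are hypothetical, and the global-rank check assumes a torchrun-style launcher that exports `RANK` (the global rank) alongside the per-node `LOCAL_RANK`.

```python
import os


def is_global_rank0() -> bool:
    # torchrun exports RANK as the global rank across all nodes; LOCAL_RANK is
    # per-node, so gating MLflow init on LOCAL_RANK would open one run per node.
    # Single-process runs (no RANK set) default to rank 0, so MLflow still starts.
    return int(os.environ.get("RANK", "0")) == 0


def should_use_mlflow(tracking_uri, experiment_name, run_name) -> bool:
    # Enable MLflow when *any* MLflow-related argument is provided,
    # not only when the tracking URI is set.
    return any([tracking_uri, experiment_name, run_name])
```

With these two helpers, `--mlflow-experiment-name` alone enables MLflow, and only one process across the whole job calls `init()`/`log_params()`.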
🧹 Nitpick comments (2)
src/mini_trainer/async_structured_logger.py (1)
77-87: Guard rank checks when the process group isn’t initialized.
`dist.get_rank()` will throw if torch.distributed isn't initialized. A small guard keeps MLflow logging safe in single-process runs and aligns with `log_sync`.

♻️ Suggested tweak

```diff
-        if self.use_wandb and dist.get_rank() == 0:
+        is_rank0 = not dist.is_initialized() or dist.get_rank() == 0
+        if self.use_wandb and is_rank0:
             wandb_data = {k: v for k, v in data.items() if k != "timestamp"}
             wandb_wrapper.log(wandb_data)
         # log to mlflow if enabled, only on MAIN rank
-        if self.use_mlflow and dist.get_rank() == 0:
+        if self.use_mlflow and is_rank0:
             step = data.get("step")
             mlflow_data = {k: v for k, v in data.items() if k != "timestamp"}
             mlflow_wrapper.log(mlflow_data, step=step)
```

src/mini_trainer/mlflow_wrapper.py (1)

11-17: Declare MLflow as an optional dependency in pyproject.toml. MLflow is already treated as optional in the code (try/except import pattern), but it's not declared in `[project.optional-dependencies]`. Add it alongside the existing `wandb` entry to make the feature discoverable and enable users to install it with `pip install 'rhai-innovation-mini-trainer[mlflow]'`.
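Combined with the later review comment asking for a minimum version pin, the suggested declaration would look roughly like this pyproject fragment (the `wandb` extra is the existing entry the nitpick mentions; the exact `>=3.0` bound is taken from that later comment and is an assumption here):

```toml
[project.optional-dependencies]
wandb = ["wandb"]
mlflow = ["mlflow>=3.0"]
```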
Codecov Report ❌ Patch coverage is
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/mini_trainer/mlflow_wrapper.py`:
- Around line 33-78: log_params() and log() re-enter an already active MLflow
run using mlflow.start_run(run_id=_active_run_id) which raises unless
nested=True is set; update both functions (log_params and log) so their with
mlflow.start_run(...) calls include nested=True (i.e., with
mlflow.start_run(run_id=_active_run_id, nested=True):) to allow re-entrancy into
the existing run when _active_run_id is set.
- Around line 81-128: log_params and log unconditionally call
mlflow.start_run(run_id=_active_run_id) which raises if a run is already active;
modify both functions (log_params, log) to first call mlflow.active_run() and
only invoke mlflow.start_run(run_id=_active_run_id) when active_run is None or
active_run.info.run_id != _active_run_id, otherwise skip starting a run and
directly call mlflow.log_params / mlflow.log_metrics; reference _active_run_id,
mlflow.active_run(), mlflow.start_run(), mlflow.log_params, and
mlflow.log_metrics to locate where to add the conditional guard.
```python
def check_mlflow_available(operation: str) -> None:
    """Check if mlflow is available, raise error if not."""
    if not MLFLOW_AVAILABLE:
        error_msg = (
            f"Attempted to {operation} but mlflow is not installed. "
            "Please install mlflow with: pip install mlflow"
        )
        logger.error(error_msg)
        raise MLflowNotAvailableError(error_msg)


def init(
    tracking_uri: Optional[str] = None,
    experiment_name: Optional[str] = None,
    run_name: Optional[str] = None,
    **kwargs,
) -> Any:
    """
    Initialize an mlflow run. Raises MLflowNotAvailableError if mlflow is not installed.

    Args:
        tracking_uri: MLflow tracking server URI (e.g., "http://localhost:5000")
        experiment_name: Name of the experiment
        run_name: Name of the run
        **kwargs: Additional arguments to pass to mlflow.start_run

    Returns:
        mlflow.ActiveRun object if successful

    Raises:
        MLflowNotAvailableError: If mlflow is not installed
    """
    global _active_run_id
    check_mlflow_available("initialize mlflow")
    if tracking_uri:
        mlflow.set_tracking_uri(tracking_uri)
    if experiment_name:
        mlflow.set_experiment(experiment_name)
    run = mlflow.start_run(run_name=run_name, **kwargs)
    _active_run_id = run.info.run_id
    return run


def get_active_run_id() -> Optional[str]:
    """Get the active run ID that was started by init()."""
    return _active_run_id
```
🧩 Analysis chain
🏁 Script executed:
```shell
find . -name "mlflow_wrapper.py" -type f
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 119

🏁 Script executed:

```shell
cat -n src/mini_trainer/mlflow_wrapper.py | head -150
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 5266
🌐 Web query:
MLflow start_run nested calls re-entrance active run behavior documentation
💡 Result:
- Default (non-nested) behavior: If there's already an active run in the current thread, calling `mlflow.start_run()` again raises an exception ("Run with UUID … is already active … end_run() … or start_run(nested=True)"). This is enforced by checking the thread-local active run stack and requiring `nested=True` to start another run while one is active. [1]
- Nested runs (`nested=True`): `mlflow.start_run(nested=True)` creates a child run under the current active run (the parent is taken from the top of the active-run stack) and pushes the new run onto the stack; child/parent linkage is recorded via the `mlflow.parentRunId` tag. [1], [2]
- Explicit parent (`parent_run_id=...`, newer API): You can start a run nested under a specific parent via `start_run(parent_run_id=...)`. The parent must be ACTIVE, and if there's already an active run, the provided `parent_run_id` must match the current active run's ID (otherwise it errors). [1]
- "Active run" semantics (re-entrance / stack): `mlflow.active_run()` returns the top of a thread-local stack (the most recently started, not-yet-ended run in the current thread). [2] `mlflow.end_run()` pops the top run from that stack and terminates it. [3] When using `with mlflow.start_run() as run: ...`, the context manager's `__exit__` ends the run only if it's still present in the active-run stack, checking by run ID (not object identity) to handle cases where tools stop and restart a run with the same ID (a practical "re-entrance" safeguard). [4]
Sources:
[1] MLflow latest mlflow.tracking.fluent.start_run implementation/docs (module source) (mlflow.org)
[2] MLflow latest mlflow.tracking.fluent.active_run docs/implementation (mlflow.org)
[3] MLflow latest mlflow.tracking.fluent.end_run docs/implementation (mlflow.org)
[4] MLflow latest ActiveRun.__exit__ behavior (module source) (mlflow.org)
Citations:
- 1: https://mlflow.org/docs/latest/api_reference/_modules/mlflow/tracking/fluent.html
- 2: https://mlflow.org/docs/latest/api_reference/_modules/mlflow/tracking/fluent.html
- 3: https://mlflow.org/docs/latest/api_reference/_modules/mlflow/tracking/fluent.html
- 4: https://mlflow.org/docs/latest/api_reference/_modules/mlflow/tracking/fluent.html
Add nested=True to start_run() calls in log_params() and log() functions
The log_params() (line 96) and log() (line 125) functions attempt to re-enter the active run via mlflow.start_run(run_id=_active_run_id), but MLflow raises an exception when start_run() is called without nested=True while a run is already active. This breaks the normal usage flow: init() → log_params()/log() → finish().
Pass nested=True to both start_run() calls to explicitly allow re-entrance into the same run:
Fix

```python
# In log_params() - line 96
if _active_run_id:
    with mlflow.start_run(run_id=_active_run_id, nested=True):
        mlflow.log_params(str_params)

# In log() - line 125
if _active_run_id:
    with mlflow.start_run(run_id=_active_run_id, nested=True):
        mlflow.log_metrics(metrics, step=step)
```

🤖 Prompt for AI Agents
In `@src/mini_trainer/mlflow_wrapper.py` around lines 33 - 78, log_params() and
log() re-enter an already active MLflow run using
mlflow.start_run(run_id=_active_run_id) which raises unless nested=True is set;
update both functions (log_params and log) so their with mlflow.start_run(...)
calls include nested=True (i.e., with mlflow.start_run(run_id=_active_run_id,
nested=True):) to allow re-entrancy into the existing run when _active_run_id is
set.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/mini_trainer/mlflow_wrapper.py`:
- Around line 96-110: The log_params function currently assumes an active mlflow
run; update mlflow_wrapper.log_params to mirror the async-safe pattern used in
log(): after calling check_mlflow_available("log params to mlflow"), detect
mlflow.active_run() and if it's None use the module-level fallback
_active_run_id to call mlflow.start_run(run_id=_active_run_id) (or use
mlflow.log_params with the run context started) so parameters are logged against
the correct run when thread-local context is lost; ensure you still convert
params to strings before logging and maintain the same error behavior.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/mini_trainer/mlflow_wrapper.py`:
- Around line 110-119: The current logging uses
mlflow.start_run(run_id=_active_run_id) which as a context manager will call
mlflow.end_run() and finish the run; instead, when active_run is None but
_active_run_id is set, use MlflowClient to log directly by run id (e.g., replace
the with mlflow.start_run(...) blocks around mlflow.log_params and the similar
block for metrics) by calling MlflowClient().log_params(run_id=_active_run_id,
params=str_params) and MlflowClient().log_metric(s)/log_batch for metrics, so
the stored run (_active_run_id) is not ended prematurely and future logs
continue to target the same run.
🧹 Nitpick comments (1)
src/mini_trainer/mlflow_wrapper.py (1)
45-88: Add a guard to reuse an existing active run and make `init()` idempotent. When a run is already active, `mlflow.start_run()` raises an exception. Checking for an existing run and reusing it prevents hard failures and handles the common case where `init()` is called multiple times.

♻️ Suggested guard for existing active runs

```diff
-    run = mlflow.start_run(run_name=run_name, **kwargs)
-    _active_run_id = run.info.run_id
-    return run
+    active_run = mlflow.active_run()
+    if active_run and not kwargs.get("nested") and not kwargs.get("parent_run_id"):
+        _active_run_id = active_run.info.run_id
+        return active_run
+
+    run = mlflow.start_run(run_name=run_name, **kwargs)
+    _active_run_id = run.info.run_id
+    return run
```

The guard preserves nesting semantics: if the caller explicitly requests a nested or child run (`nested=True` or `parent_run_id`), a new run is created; otherwise, the existing active run is reused.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/mini_trainer/mlflow_wrapper.py`:
- Around line 45-90: The init function can raise TypeError if callers include
run_name in kwargs as well as the explicit run_name parameter; before calling
mlflow.start_run, pop and prefer the explicit run_name from kwargs (e.g., if
"run_name" in kwargs: kwargs.pop("run_name")) so the explicit run_name argument
takes precedence and no duplicate keyword is passed to mlflow.start_run; update
the call site (mlflow.start_run(run_name=run_name, **kwargs)) in init to use the
sanitized kwargs.
🧹 Nitpick comments (1)
src/mini_trainer/mlflow_wrapper.py (1)
98-125: DRY the active-run resolution for params/metrics. The active-run branching is duplicated in two places. A small helper reduces drift risk while preserving behavior.

♻️ Optional refactor

```diff
+def _ensure_run_for_logging() -> None:
+    active_run = mlflow.active_run()
+    if not active_run and _active_run_id:
+        # No active run in this thread but we have a stored run ID
+        mlflow.start_run(run_id=_active_run_id)
+
+
 def log_params(params: Dict[str, Any]) -> None:
@@
-    # Check if there's an active run (same pattern as log() for async safety)
-    active_run = mlflow.active_run()
-    if active_run:
-        # Run is active, log directly
-        mlflow.log_params(str_params)
-    elif _active_run_id:
-        # No active run in this thread but we have a stored run ID - resume it
-        # This can happen in async contexts where thread-local context is lost
-        # Note: We don't use context manager here because it would end the run on exit
-        mlflow.start_run(run_id=_active_run_id)
-        mlflow.log_params(str_params)
-    else:
-        # No run context at all - log anyway (MLflow will create a run)
-        mlflow.log_params(str_params)
+    _ensure_run_for_logging()
+    mlflow.log_params(str_params)


 def log(data: Dict[str, Any], step: Optional[int] = None) -> None:
@@
     if metrics:
-        # Check if there's an active run
-        active_run = mlflow.active_run()
-        if active_run:
-            # Run is active, log directly
-            mlflow.log_metrics(metrics, step=step)
-        elif _active_run_id:
-            # No active run in this thread but we have a stored run ID - resume it
-            # This can happen in async contexts where thread-local context is lost
-            # Note: We don't use context manager here because it would end the run on exit
-            mlflow.start_run(run_id=_active_run_id)
-            mlflow.log_metrics(metrics, step=step)
-        else:
-            # No run context at all - log anyway (MLflow will create a run)
-            mlflow.log_metrics(metrics, step=step)
+        _ensure_run_for_logging()
+        mlflow.log_metrics(metrics, step=step)
```

Also applies to: 128-161
- Add mlflow_wrapper.py for optional MLflow imports with error handling
- Add mlflow_tracking_uri, mlflow_experiment_name, mlflow_run_name to TrainingArgs
- Update AsyncStructuredLogger to support MLflow logging
- Add MLflow CLI args to api_train.py and train.py
- Initialize MLflow at start, log params, finish at end of training

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Enable MLflow when any MLflow arg is provided (not just tracking_uri)
- Only init/finish MLflow on global rank 0 to avoid multiple runs in multi-node

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Store the run ID when initializing MLflow and use it explicitly when logging params/metrics. This fixes an issue where async logging would lose the thread-local run context and create a separate MLflow run. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous fix incorrectly tried to start a run in log_params() when the run was already active from init(). Now log_params() logs directly since the run is already active, and log() only resumes the run if it's not currently active (for async contexts). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Guard dist.get_rank() when process group isn't initialized in async log()
- Add mlflow as optional dependency in pyproject.toml

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement kwarg > env var precedence for mlflow_tracking_uri and mlflow_experiment_name, matching the behavior of instructlab-training. The configuration now follows this precedence:

1. Explicit kwargs (highest priority)
2. Environment variables (MLFLOW_TRACKING_URI, MLFLOW_EXPERIMENT_NAME)
3. MLflow defaults (lowest priority)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
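The precedence this commit describes can be sketched as a small resolver; `resolve_setting` is an illustrative name, not a function from the PR:

```python
import os


def resolve_setting(kwarg_value, env_var):
    """Return the effective setting: kwarg > env var > None (MLflow default)."""
    if kwarg_value is not None:
        # 1) Explicit kwarg always wins.
        return kwarg_value
    # 2) Fall back to the environment variable if set;
    # 3) otherwise None, deferring to MLflow's own default behavior.
    return os.environ.get(env_var)
```

For example, `resolve_setting(args.mlflow_tracking_uri, "MLFLOW_TRACKING_URI")` picks the CLI value when given, else the environment.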
Mirror the pattern used in log() to handle cases where thread-local MLflow context is lost in async contexts. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Using `with mlflow.start_run(run_id=...)` as a context manager ends the run when the block exits, breaking subsequent logging calls. Changed to call start_run() without context manager to keep the run active. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
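The pitfall this commit fixes is generic context-manager behavior, which a toy class demonstrates without MLflow installed (`FakeRun` is a stand-in; in real MLflow, `ActiveRun.__exit__` is what calls `end_run()`):

```python
class FakeRun:
    """Toy analogue of an MLflow run: __exit__ plays the role of end_run()."""

    def __init__(self):
        self.active = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.active = False  # the run is ended as soon as the block exits
        return False
```

After `with FakeRun() as run: ...` exits, `run.active` is False, so any later logging against that run would target an already-ended run, which is exactly why the fix calls `start_run()` without a `with` block.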
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Force-pushed 8b4a147 to 33fa513
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/mini_trainer/wandb_wrapper.py (1)
27-35: ⚠️ Potential issue | 🟡 Minor — Install hint should match the actual distribution name.

Line 32 references `mini-trainer[wandb]`, but `pyproject.toml` defines the distribution name as `rhai-innovation-mini-trainer`. Users following this hint will fail to install the package. Update to the correct distribution name:

🔧 Suggested update

```diff
-        "Please install wandb with: pip install 'mini-trainer[wandb]'"
+        "Please install wandb with: pip install 'rhai-innovation-mini-trainer[wandb]'"
```
🤖 Fix all issues with AI agents
In `@src/mini_trainer/api_train.py`:
- Around line 157-165: The MLflow experiment/run name flags are currently only
added when train_args.mlflow_tracking_uri is set; update the logic in
src/mini_trainer/api_train.py so that the appending of --mlflow-experiment-name
and --mlflow-run-name is done independently of the --mlflow-tracking-uri block:
keep the existing append of
f"--mlflow-tracking-uri={train_args.mlflow_tracking_uri}" inside the if
train_args.mlflow_tracking_uri: block, but move the if
train_args.mlflow_experiment_name: and if train_args.mlflow_run_name: checks out
of that block so they append their flags to the command list even when
mlflow_tracking_uri is None (consistent with use_mlflow = any([...])).
In `@src/mini_trainer/mlflow_wrapper.py`:
- Around line 28-42: The error message in check_mlflow_available currently
suggests installing "mini-trainer[mlflow]" which doesn't match the package name;
update the error_msg string used in check_mlflow_available (and raised via
MLflowNotAvailableError) to instruct installing
"rhai-innovation-mini-trainer[mlflow]" instead, keeping the same phrasing and
logger.error call and ensuring MLFLOW_AVAILABLE and MLflowNotAvailableError
usage remain unchanged.
- Around line 45-99: The current init function unconditionally does
kwargs.pop("run_name", None), which discards a caller-supplied run name passed
via **kwargs; change the logic so that if the explicit parameter run_name is
None and "run_name" exists in kwargs, promote it by assigning run_name =
kwargs.pop("run_name"), otherwise (when run_name is explicitly provided) remove
"run_name" from kwargs to avoid duplicate arguments before calling
mlflow.start_run; update references around init, run_name, kwargs.pop and
mlflow.start_run accordingly.
🧹 Nitpick comments (2)
src/mini_trainer/async_structured_logger.py (1)
78-91: Avoid logging `step` twice to MLflow. `step` is already supplied as the step argument; leaving it in `mlflow_data` records a redundant `step` metric. Consider filtering it out.

♻️ Suggested tweak

```diff
-            mlflow_data = {k: v for k, v in data.items() if k != "timestamp"}
+            mlflow_data = {
+                k: v for k, v in data.items() if k not in ("timestamp", "step")
+            }
```

pyproject.toml (1)

65-65: MLflow dependency needs a minimum version constraint for Python 3.12 compatibility. Current MLflow releases (3.x series) fully support Python 3.11–3.12. However, the declaration `mlflow = ["mlflow"]` lacks a version constraint and could allow installation of older MLflow 2.x versions, which had Python 3.12 compatibility issues. Add a minimum version constraint such as `mlflow = ["mlflow>=3.0"]` to ensure users always get a compatible release.
```python
# mlflow-related arguments
if train_args.mlflow_tracking_uri:
    command.append(f"--mlflow-tracking-uri={train_args.mlflow_tracking_uri}")
    if train_args.mlflow_experiment_name:
        command.append(
            f"--mlflow-experiment-name={train_args.mlflow_experiment_name}"
        )
    if train_args.mlflow_run_name:
        command.append(f"--mlflow-run-name={train_args.mlflow_run_name}")
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n src/mini_trainer/api_train.py | sed -n '150,175p'
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 1447

🏁 Script executed:

```shell
# Get broader context around the MLflow configuration
cat -n src/mini_trainer/api_train.py | sed -n '140,180p'
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 2126

🏁 Script executed:

```shell
# Check if there are other MLflow references or documentation in the repo
rg -i "mlflow" --type py -A 2 -B 2 src/mini_trainer/api_train.py | head -80
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 628

🏁 Script executed:

```shell
# Find where the MLflow CLI arguments are parsed/handled
rg "mlflow-tracking-uri|mlflow-experiment-name|mlflow-run-name" --type py -B 3 -A 3
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 988

🏁 Script executed:

```shell
# Check the training script that receives these arguments
fd "train\.py" src/ -type f
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 253

🏁 Script executed:

```shell
# Look for MLflow initialization or argument parsing in the codebase
rg -i "mlflow" --type py -B 2 -A 2 | head -100
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 6208

🏁 Script executed:

```shell
cat src/mini_trainer/mlflow_wrapper.py
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 5599
MLflow experiment and run name args should be independent of tracking URI.
Lines 160–165 only append --mlflow-experiment-name and --mlflow-run-name when mlflow_tracking_uri is set. However, the training code enables MLflow if any of the three MLflow args are provided (use_mlflow = any([mlflow_tracking_uri, mlflow_experiment_name, mlflow_run_name])), and MLflow supports using the default local backend without an explicit tracking URI. Move these two conditionals outside the mlflow_tracking_uri block so they append independently.
🛠️ Suggested fix
```diff
-    if train_args.mlflow_tracking_uri:
-        command.append(f"--mlflow-tracking-uri={train_args.mlflow_tracking_uri}")
-        if train_args.mlflow_experiment_name:
-            command.append(
-                f"--mlflow-experiment-name={train_args.mlflow_experiment_name}"
-            )
-        if train_args.mlflow_run_name:
-            command.append(f"--mlflow-run-name={train_args.mlflow_run_name}")
+    if train_args.mlflow_tracking_uri:
+        command.append(f"--mlflow-tracking-uri={train_args.mlflow_tracking_uri}")
+    if train_args.mlflow_experiment_name:
+        command.append(
+            f"--mlflow-experiment-name={train_args.mlflow_experiment_name}"
+        )
+    if train_args.mlflow_run_name:
+        command.append(f"--mlflow-run-name={train_args.mlflow_run_name}")
```

🤖 Prompt for AI Agents
In `@src/mini_trainer/api_train.py` around lines 157 - 165, The MLflow
experiment/run name flags are currently only added when
train_args.mlflow_tracking_uri is set; update the logic in
src/mini_trainer/api_train.py so that the appending of --mlflow-experiment-name
and --mlflow-run-name is done independently of the --mlflow-tracking-uri block:
keep the existing append of
f"--mlflow-tracking-uri={train_args.mlflow_tracking_uri}" inside the if
train_args.mlflow_tracking_uri: block, but move the if
train_args.mlflow_experiment_name: and if train_args.mlflow_run_name: checks out
of that block so they append their flags to the command list even when
mlflow_tracking_uri is None (consistent with use_mlflow = any([...])).
```python
class MLflowNotAvailableError(ImportError):
    """Raised when mlflow functions are called but mlflow is not installed."""

    pass


def check_mlflow_available(operation: str) -> None:
    """Check if mlflow is available, raise error if not."""
    if not MLFLOW_AVAILABLE:
        error_msg = (
            f"Attempted to {operation} but mlflow is not installed. "
            "Please install mlflow with: pip install 'mini-trainer[mlflow]'"
        )
        logger.error(error_msg)
        raise MLflowNotAvailableError(error_msg)
```
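For context, the `MLFLOW_AVAILABLE` flag used in the snippet above typically comes from a guarded import like this sketch of the try/except pattern the earlier nitpick mentions (the repository's exact module-level code may differ):

```python
# Optional-dependency import guard: the module loads even when mlflow is absent,
# and callers check MLFLOW_AVAILABLE (via check_mlflow_available) before use.
try:
    import mlflow  # noqa: F401
    MLFLOW_AVAILABLE = True
except ImportError:
    mlflow = None
    MLFLOW_AVAILABLE = False
```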
🧩 Analysis chain
🏁 Script executed:

```shell
find . -name "pyproject.toml" -type f | head -5
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 99

🏁 Script executed:

```shell
cat pyproject.toml
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 2785

🏁 Script executed:

```shell
sed -n '28,42p' src/mini_trainer/mlflow_wrapper.py
```

Repository: Red-Hat-AI-Innovation-Team/mini_trainer
Length of output: 618
Align MLflow install hint with distribution name.
Line 39 references mini-trainer[mlflow], but the distribution name in pyproject.toml is rhai-innovation-mini-trainer. Update the error message to match.
🔧 Suggested update
```diff
-            "Please install mlflow with: pip install 'mini-trainer[mlflow]'"
+            "Please install mlflow with: pip install 'rhai-innovation-mini-trainer[mlflow]'"
```

🤖 Prompt for AI Agents
In `@src/mini_trainer/mlflow_wrapper.py` around lines 28 - 42, The error message
in check_mlflow_available currently suggests installing "mini-trainer[mlflow]"
which doesn't match the package name; update the error_msg string used in
check_mlflow_available (and raised via MLflowNotAvailableError) to instruct
installing "rhai-innovation-mini-trainer[mlflow]" instead, keeping the same
phrasing and logger.error call and ensuring MLFLOW_AVAILABLE and
MLflowNotAvailableError usage remain unchanged.
Maxusmusti left a comment
LGTM, all my review comments have been addressed
Summary
- `mlflow_wrapper.py` for optional MLflow imports with error handling
- `mlflow_tracking_uri`, `mlflow_experiment_name`, `mlflow_run_name` added to TrainingArgs
- `AsyncStructuredLogger` updated to support MLflow logging
- MLflow CLI args added to `api_train.py` and `train.py`

Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Chores